

## **ESE 345 Computer Architecture**

## Designing a Multicycle Processor Datapath and Control



### **Abstract View of a Single Cycle Processor**





### **Single Cycle Implementation**

- Calculate cycle time assuming negligible delays except:
  - memory, ALU and adders, register file access





## **Worst Case Timing (Load)**



#### Where We are Headed

- Single Cycle Problems:
  - what if we had a more complicated instruction like floating point?
- One Solution:
  - use a "smaller" cycle time
  - have different instructions take different numbers of cycles
  - a "multicycle" datapath:



### **Reducing Cycle Time**

- Cut combinational dependency graph and insert register / latch
- Do same work in two fast cycles, rather than one slow one
- May be able to remove some components for some instructions!







### **Basic Limits on Cycle Time**

- Next address logic
  - PC <= branch ? PC + offset : PC + 4</p>
- Instruction Fetch
  - InstructionReg <= Mem[PC]</li>
- Register Access
  - A <= R[rs]</li>
- ALU operation





### Partitioning the CPI=1 Datapath

Add registers between smallest steps





### **Multicycle Approach 1/2**

- Break up the instructions into steps, each step takes a cycle
  - balance the amount of work to be done
  - restrict each cycle to use only one major functional unit
- At the end of a cycle
  - store values for use in later cycles (easiest thing to do)
  - introduce additional "internal" registers





### **Multicycle Approach 2/2**

- We will be reusing functional units
  - ALU used to compute address and to increment PC
  - Memory used for instruction and data
- Our control signals will not be determined directly by instruction
  - e.g., what should the ALU do for a "subtract" instruction?
- We'll use a finite state machine for control



## Recall: Step-by-step Processor Design

Step 1: ISA => Logical Register Transfers

Step 2: Components of the Datapath

Step 3: RTL + Components => Datapath

Step 4: Datapath + Logical RTs => Physical RTs

Step 5: Physical RTs => Control



#### Instructions from ISA Perspective

- Consider each instruction from perspective of ISA (at the logical register-transfer level).
- Example:
  - The add instruction changes a register.
  - Register specified by bits 15:11 of instruction.
  - Instruction specified by the PC.
  - New value is the sum ("op") of two registers.
  - Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]
  - In order to accomplish this we must break up the instruction.



### **Breaking Down an Instruction**

ISA definition of arithmetic:

Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]]

- Could break down to:
  - IR <= Memory[PC]</li>
  - A <= Reg[IR[25:21]]</li>
  - B <= Reg[IR[20:16]]</li>
  - ALUOut <= A op B</li>
  - Reg[IR[15:11]] <= ALUOut</li>
- And do not forget an important part of the definition of arithmetic!
  - PC <= PC + 4</p>

#### Idea Behind a Multicycle Approach

- We define each instruction from the ISA perspective (logical RTL)
- Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps)
- Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.)
- Finally try and pack as much work into each step
   (avoid unnecessary cycles)
   while also trying to share steps where possible
   (minimizes control, helps to simplify solution)
- Result: Our multicycle Implementation!



### **Multicycle Processor**





### **Five Execution Steps**

- Instruction Fetch
- Instruction Decode and Register Fetch
- Execution, Memory Address Computation, or Branch Completion
- Memory Access or R-type instruction completion
- Write-back step

INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!



### **Step 1: Instruction Fetch**

- Use PC to get instruction and put it in the Instruction Register.
- Increment the PC by 4 and put the result back in the PC.
- Can be described succinctly using RTL "Register-Transfer Language"

```
IR <= Memory[PC];
PC <= PC + 4;</pre>
```

What is the advantage of updating the PC now?

#### Step 2: Instruction Decode and Register Fetch

- Read registers rs and rt in case we need them
- Compute the branch address in case the instruction is a branch
- (Physical) RTL:

```
A <= Reg[IR[25:21]];
B <= Reg[IR[20:16]];
ALUOut <= PC + (sign-extend(IR[15:0]) << 2);
```

 We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)



### **Step 3 (Instruction Dependent)**

- ALU is performing one of three functions, based on instruction type
- Memory Reference:

R-type:

Branch:



#### Step 4 (R-type or Memory-Access) and Write-Back Step 5

- Step 4
- Loads and stores access memory

```
MDR <= Memory[ALUOut];
     or
Memory[ALUOut] <= B;</pre>
```

R-type instructions finish

The write actually takes place at the end of the cycle on the edge

- Write-back step 5
- Reg[IR[20:16]] <= MDR;</li>

Which instruction needs this?



# **Summary:**

| Step name                                                 | Action for R-type<br>instructions                                                            | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOut <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOut | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |



### **Simple Questions**

How many cycles will it take to execute this code?

Iw \$t2, 0(\$t3) Iw \$t3, 4(\$t3) beq \$t2, \$t3, Label add \$t5, \$t2, \$t3 sw \$t5, 8(\$t3)

#assume not taken

Label:

\_\_\_

- What is going on during the 8th cycle of execution?
- In what cycle does the actual addition of \$t2 and \$t3 takes place?

| Step name                                                 | Action for R-type<br>instructions                                                            | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOut <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOUT | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |

#### Implementing the Control for a Multicycle Processor

- Value of control signals is dependent upon:
  - what instruction is being executed
  - which step is being performed
- Use the information we've accumulated to specify a finite state machine
  - specify the finite state machine graphically, or
  - use microprogramming
- Implementation can be derived from specification



#### **Review: Finite State Machines**

- Finite state machines:
  - a set of states and
  - next state function (determined by current state and the input)
  - output function (determined by current state and possibly input)



- We'll use a Moore machine for the output function
  - output based only on current state



### **Multicycle**

### **Processor**

**State 0 (IF & PC+4)** 

- lorD =0
- MemRead=1
- IRWrite=1
- ALUSrcA=0
- ALUScrB=01
- ALUop=00(add),
- PCSource =00/
- PCWrite=1

State 1 (IDcd+RF)

- ALUSrcA=0
- ALUScrB=11
- ALUop=00(add)



| Step name                                                 | Action for R-type instructions                                                               | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOut <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOUT | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |









| Step name                                                 | Action for R-type<br>instructions                                                            | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOut <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOUT | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |





| Step name                                                 | Action for R-type instructions                                                               | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOUt <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOUT | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |





| Step name                                                 | Action for R-type instructions                                                               | Action for memory-<br>reference instructions                     | Action for<br>branches      | Action for jumps                         |
|-----------------------------------------------------------|----------------------------------------------------------------------------------------------|------------------------------------------------------------------|-----------------------------|------------------------------------------|
| Instruction fetch                                         | IR <= Memory[PC]<br>PC <= PC + 4                                                             |                                                                  |                             |                                          |
| Instruction decode/register fetch                         | A <= Reg [IR[25:21]]<br>B <= Reg [IR[20:16]]<br>ALUOUt <= PC + (sign-extend (IR[15:0]) << 2) |                                                                  |                             |                                          |
| Execution, address computation,<br>branch/jump completion | ALUOut <= A op B                                                                             | ALUOut <= A + sign-extend<br>(IR[15:0])                          | If (A == B)<br>PC <= ALUOUT | PC <= {PC [31:28],<br>(IR[25:0]],2'b00)} |
| Memory access or R-type completion                        | Reg [IR[15:11]] <=<br>ALUOut                                                                 | Load: MDR <= Memory[ALUOut]<br>or<br>Store: Memory [ALUOut] <= B |                             |                                          |
| Memory read completion                                    |                                                                                              | Load: Reg[IR[20:16]] <= MDR                                      |                             |                                          |



#### **Graphical Specification of FSM**

- Note:
  - don't care if not mentioned
  - asserted if name only
  - otherwise exact value

How many state bits will we need?





#### **Summary**

- If we understand the instructions...
  We can build a simple processor!
- If instructions take different amounts of time, multi-cycle is better
- Datapath implemented using:
  - Combinational logic for arithmetic
  - State holding elements to remember bits
- Control implemented using:
  - Combinational logic for single-cycle implementation
  - Finite state machine for multi-cycle implementation



### **Acknowledgements**

- These slides contain material developed and copyright by:
  - Morgan Kauffmann (Elsevier, Inc.)
  - Arvind (MIT)
  - Krste Asanovic (MIT/UCB)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)

